
Fix logps #86

Merged
tastelikefeet merged 15 commits into modelscope:main from tastelikefeet:fix/0301-1
Mar 1, 2026

Conversation

@tastelikefeet
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information for this PR.

Experiment results

Paste your experiment results here (if needed).

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the twinkle framework by standardizing the handling of log probabilities (logps) and loss outputs across its core components. It introduces a dedicated LossOutput structure and integrates logps into the ModelOutput, ensuring that model implementations and loss functions consistently process and return these values. Concurrently, the PR updates examples and documentation to reflect the latest Qwen model IDs and provides comprehensive guides for diverse training environments, from single-GPU setups to distributed Ray and remote training scenarios. These changes aim to improve the framework's robustness, clarity, and support for advanced training techniques.

Highlights

  • Log Probability (logps) Handling: Introduced explicit handling and propagation of log probabilities (logps) throughout the framework, including adding a 'logps' field to ModelOutput and refactoring loss functions to return a structured LossOutput object.
  • Model ID Updates: Updated numerous examples and documentation across English and Chinese versions to use newer Qwen model IDs (e.g., Qwen/Qwen3-4B, Qwen/Qwen3.5-35B-A3B) instead of older Qwen2.5 models.
  • Loss Function Refactoring: Modified all core loss functions (ChunkedCrossEntropyLoss, CrossEntropyLoss, GRPOLoss, MSELoss, VocabParallelCrossEntropyLoss) to return a LossOutput dictionary containing both the loss value and the number of tokens, improving consistency and data flow.
  • Model Forward/Backward Enhancements: Enhanced MegatronModel and TransformersModel to correctly compute and pass logps during forward and backward passes, ensuring these values are available for downstream components like loss calculations and Tinker API compatibility.
  • Documentation Expansion: Significantly expanded the 'Quick Start' guides (English and Chinese) with detailed sections and code examples for various usage patterns, including partial component usage, single GPU, torchrun, Ray training, and remote training setups.
  • Tinker API Compatibility Improvements: Updated the Tinker API compatibility layer to correctly process and return logps and logits from model outputs, and ensured input tensors are moved to CPU before conversion to Datum types.
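The output structures described in the first highlight can be sketched as plain TypedDicts. This is a minimal illustration based on the changelog entries for src/twinkle/data_format/output.py, with OutputType narrowed to scalars and lists for brevity (per this PR the real alias also covers torch tensors):

```python
from typing import Optional, TypedDict, Union

# Scalar stand-in for the framework's OutputType; this PR extends the
# real alias to include float alongside tensor types.
OutputType = Union[float, list]

class ModelOutput(TypedDict, total=False):
    logits: Optional[OutputType]   # now Optional
    loss: Optional[OutputType]     # now Optional
    logps: Optional[OutputType]    # new: per-token log probabilities

class LossOutput(TypedDict):
    loss: OutputType
    num_tokens: int

# Every loss __call__ now returns this shape instead of a bare tensor.
out: LossOutput = {"loss": 2.31, "num_tokens": 128}
```

A likely motivation for carrying num_tokens alongside the loss is token-weighted aggregation across micro-batches, though the PR text does not state this explicitly.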


Changelog
  • README.md
    • Added introductory text for sample code capabilities.
    • Updated base_model from Qwen2.5-7B-Instruct to Qwen3-4B in Ray training example.
    • Replaced base_url and api_key with placeholders and updated SelfCognitionProcessor team name in Tinker API example.
  • README_ZH.md
    • Added a new paragraph explaining Twinkle's algorithm interface flexibility.
    • Updated the changelog entry for the initial release.
    • Modified the note regarding the Tinker API compatible server.
    • Added introductory text for sample code capabilities.
    • Updated base_model from Qwen2.5-7B-Instruct to Qwen3-4B in Ray training example.
    • Updated section title for Tinker API to include '实现无服务器式训练'.
    • Replaced base_url and api_key with placeholders and updated SelfCognitionProcessor team name in Tinker API example.
  • cookbook/megatron/tp.py
    • Imported input_feature_to_datum and TwinkleCompatModelBase.
    • Updated model_id from Qwen2.5-7B-Instruct to Qwen3-4B in evaluation and training datasets and model initialization.
    • Added lines to calculate _inputs and _temp using input_feature_to_datum and TwinkleCompatModelBase._get_forward_output.
  • cookbook/megatron/tp_moe.py
    • Updated model_id from Qwen3-30B-A3B-Instruct-2507 to Qwen3.5-35B-A3B in evaluation and training datasets and model initialization.
    • Modified model.save call to include merge_lora=True.
  • cookbook/ray/single_controller.py
    • Updated model_id from Qwen2.5-7B-Instruct to Qwen3.5-35B-A3B in evaluation dataset.
    • Updated model_id from Qwen2.5-7B-Instruct to Qwen3-4B in training dataset and model initialization.
  • cookbook/rl/grpo.py
    • Updated MODEL_ID from Qwen2.5-3B-Instruct to Qwen3-4B.
  • cookbook/transformers/fsdp2.py
    • Updated model_id from Qwen2.5-7B-Instruct to Qwen3-4B in evaluation and training datasets and model initialization.
  • cookbook/transformers/sp_fsdp_dense.py
    • Updated MODEL_ID from Qwen2.5-7B-Instruct to Qwen3-4B.
  • docs/source_en/Components/Advantage/GRPOAdvantage.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for actor and sampler.
  • docs/source_en/Components/Advantage/RLOOAdvantage.md
    • Updated section title from 'Complete Training Example' to 'Training Example'.
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for actor and sampler.
    • Added dataloader = ... line.
  • docs/source_en/Components/Data Format/Sampling.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler.
  • docs/source_en/Components/Dataset/Dataset.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B in set_template example.
  • docs/source_en/Components/Model/MegatronModel.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B in model initialization example.
  • docs/source_en/Components/Model/TransformersModel.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B in model initialization example.
  • docs/source_en/Components/Reward/Reward.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler.
  • docs/source_en/Components/Sampler/TorchSampler.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler.
  • docs/source_en/Components/Sampler/vLLMSampler.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler in two examples.
  • docs/source_en/Usage Guide/Quick-Start.md
    • Added new sections for 'Usage Patterns', 'Using Only Partial Components', 'Single GPU', 'torchrun', 'Ray Training', and 'Remote Training' with extensive code examples.
    • Updated model IDs in existing and new examples to Qwen/Qwen3-4B.
    • Updated the model list table with a new format and potentially new models.
  • docs/source_en/Usage Guide/Server and Client/Tinker-Compatible-Client.md
    • Updated SelfCognitionProcessor team name from twinkle team to ModelScope Team.
  • docs/source_en/Usage Guide/Server and Client/Twinkle-Client.md
    • Updated SelfCognitionProcessor team name from twinkle team to ModelScope Team.
  • docs/source_en/Usage Guide/Train-as-a-Service.md
    • Added a sample code block for using the TaaS training service, including init_tinker_client and ServiceClient usage.
    • Updated SelfCognitionProcessor team name from Twinkle Model to ModelScope Team.
  • docs/source_zh/使用指引/快速开始.md
    • Added new sections for '使用模式', '仅使用部分组件', '单GPU', 'torchrun', 'Ray训练', and '远程训练' with extensive code examples.
    • Updated model IDs in existing and new examples to Qwen/Qwen3-4B.
  • docs/source_zh/使用指引/训练服务.md
    • Added a sample code block for using the TaaS training service, including init_tinker_client and ServiceClient usage.
    • Updated SelfCognitionProcessor team name from Twinkle Model to ModelScope Team.
  • docs/source_zh/组件/优势/GRPOAdvantage.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for actor and sampler.
  • docs/source_zh/组件/优势/RLOOAdvantage.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for actor and sampler.
  • docs/source_zh/组件/奖励/Reward.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler.
  • docs/source_zh/组件/数据格式/Sampling.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler.
  • docs/source_zh/组件/数据集/Dataset.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B in set_template example.
  • docs/source_zh/组件/模型/MegatronModel.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B in model initialization example.
  • docs/source_zh/组件/模型/TransformersModel.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B in model initialization example.
  • docs/source_zh/组件/采样器/TorchSampler.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler.
  • docs/source_zh/组件/采样器/vLLMSampler.md
    • Updated model_id from Qwen/Qwen2.5-7B-Instruct to ms://Qwen/Qwen3-4B for sampler in two examples.
  • src/twinkle/data_format/__init__.py
    • Imported LossOutput in addition to ModelOutput.
  • src/twinkle/data_format/output.py
    • Modified OutputType to include float.
    • Updated ModelOutput fields (logits, loss) to be Optional and added an Optional[OutputType] field for logps.
    • Added a new LossOutput TypedDict with loss and num_tokens fields.
  • src/twinkle/infra/__init__.py
    • Added a check to get_device_placement to return a message if device_group is None.
    • Modified new_init to check os.environ.get('WORKER_NAME') is None in addition to remote_group is None for local initialization warning.
  • src/twinkle/loss/base.py
    • Updated the __call__ method signature to return LossOutput.
  • src/twinkle/loss/chunked_cross_entropy.py
    • Imported LossOutput.
    • Added a docstring """TODO untested code""".
    • Modified the backward method to return a LossOutput object.
  • src/twinkle/loss/cross_entropy.py
    • Imported LossOutput and selective_log_softmax.
    • Modified the __call__ method to return a LossOutput object.
  • src/twinkle/loss/grpo.py
    • Imported LossOutput instead of Trajectory.
    • Updated the __call__ method signature to remove the -> 'torch.Tensor' return type hint and removed the Returns section from its docstring.
    • Modified the __call__ method to return a LossOutput object.
  • src/twinkle/loss/mse.py
    • Imported LossOutput.
    • Modified the __call__ method to return a LossOutput object.
  • src/twinkle/loss/vocab_parallel_cross_entropy.py
    • Imported LossOutput.
    • Modified the __call__ method to return a LossOutput object.
  • src/twinkle/model/base.py
    • Updated return type hints for abstract methods (forward, forward_only, calculate_loss, backward, forward_backward, clip_grad_norm, step, zero_grad, lr_step, clip_grad_and_step, set_loss, set_optimizer, set_lr_scheduler, save, load, get_state_dict, apply_patch, add_metric, calculate_metric, add_adapter_to_model, set_template, set_processor, upload_to_hub) to be more specific or None.
  • src/twinkle/model/megatron/megatron.py
    • Imported selective_log_softmax.
    • Changed variable_seq_lengths default to False.
    • Modified post_loss_function to accept logps and use LossOutput structure.
    • Modified forward_step_func to calculate logps and pass them to post_loss_function.
    • Added logps to the collected outputs in forward_backward.
    • Modified clip_grad_norm to return 0 instead of pass.
    • Modified the return value of forward_backward to be a ModelOutput containing loss and logps.
  • src/twinkle/model/megatron/strategy/megatron.py
    • Modified gather_loss_for_cp to accept logps and include them in the returned dictionary.
  • src/twinkle/model/transformers/transformers.py
    • Imported selective_log_softmax.
    • Modified forward and forward_only to calculate logps and add them to outputs if labels are present.
    • Modified calculate_loss to correctly extract loss_value and counts from LossOutput.
    • Modified forward_backward to return outputs (which now includes loss and logps) instead of just loss.
    • Removed the return value from clip_grad_and_step.
  • src/twinkle/processor/base.py
    • Added a check for self.device_mesh is None in split_cp.
  • src/twinkle/reward/base.py
    • Updated the __call__ method signature to return List[float].
  • src/twinkle/server/main.py
    • Updated the default model-id in the quick start example from Qwen/Qwen2.5-7B-Instruct to Qwen/Qwen3-4B.
  • src/twinkle/server/tinker/common/compat_base.py
    • Modified _get_forward_output to accept logps and handle cases where logits might be None. It now uses logps if provided for calculating token_log_probs.
  • src/twinkle/server/tinker/common/datum.py
    • Modified input_feature_to_datum to call .cpu() on input_ids and labels_raw before converting to numpy arrays, ensuring they are on CPU.
  • src/twinkle/server/tinker/common/megatron_model.py
    • Modified forward_backward to correctly extract loss, logits_list, and logps from the ModelOutput returned by the super call. It then passes logits and logps to _get_forward_output.
    • Modified forward_only to extract logps and pass them to _get_forward_output.
  • src/twinkle/server/tinker/common/transformers_model.py
    • Modified forward_only and forward_backward to extract logps from outputs and pass them to _get_forward_output.
  • src/twinkle/utils/framework.py
    • Removed import torch (it's not directly used in gather_object after the change).
    • Added a None check for device_mesh in gather_object.
  • src/twinkle/utils/torch_utils.py
    • Added # noqa comments to except Exception blocks.
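To make the LossOutput return convention from the loss refactors above concrete, here is a hedged, scalar-only stand-in for a loss __call__ that returns the {loss, num_tokens} shape. It is not twinkle's actual CrossEntropyLoss, which operates on batched tensors:

```python
import math

def cross_entropy_call(logits, labels, ignore_index=-100):
    """Toy stand-in for a loss __call__ returning a LossOutput-style dict."""
    total, n = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked tokens do not count toward num_tokens
        m = max(row)  # numerically stable log-sum-exp
        logsumexp = m + math.log(sum(math.exp(x - m) for x in row))
        total += logsumexp - row[label]  # negative log-prob of the label
        n += 1
    return {"loss": total / max(n, 1), "num_tokens": n}
```

Returning the token count with the loss keeps the callers (calculate_loss in TransformersModel, post_loss_function in MegatronModel) from re-deriving how many tokens contributed to the mean.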

# Conflicts:
#	src/twinkle/server/tinker/common/compat_base.py
#	src/twinkle/server/tinker/common/megatron_model.py
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a fix for log probabilities (logps) handling by refactoring the ModelOutput and LossOutput data structures and updating the model and loss calculation logic to correctly propagate logps. A significant portion of the changes involves updating documentation and examples to use newer models and reflect the API changes. I've found a couple of issues: some leftover debugging code in an example script that is now broken, and some inconsistencies in model IDs within the documentation.

I am having trouble creating individual review comments, so my feedback is collected below.

cookbook/megatron/tp.py (68-69) — high

This block of code seems to be leftover from debugging. The _temp variable is not used anywhere. Furthermore, the call to TwinkleCompatModelBase._get_forward_output is now incorrect as its signature has changed to include logps, and this call will raise a TypeError. This code should be removed.
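The signature change referenced here (summarized in the changelog entry for src/twinkle/server/tinker/common/compat_base.py: _get_forward_output now accepts logps and tolerates logits being None) can be sketched as follows. This is a scalar illustration with simplified stand-in names, not the real implementation:

```python
import math

def _token_log_prob(logits_row, label):
    # Fallback path: numerically stable log-softmax of the label entry.
    m = max(logits_row)
    return logits_row[label] - (m + math.log(sum(math.exp(x - m) for x in logits_row)))

def get_forward_output(logits, labels, logps=None):
    # Sketch of the new contract: precomputed logps take precedence,
    # and logits may be None when logps are supplied.
    if logps is not None:
        return logps
    if logits is None:
        raise ValueError("either logps or logits must be provided")
    return [_token_log_prob(row, lab) for row, lab in zip(logits, labels)]
```

A positional call written against the old logits-only signature can break under such a change, which is consistent with the TypeError the review flags in cookbook/megatron/tp.py.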

docs/source_en/Usage Guide/Quick-Start.md (877) — medium

The model_id here and in other examples in this file (lines 880, 906, 914) is missing the ms:// prefix. Other documentation files and examples in this PR consistently use the ms:// prefix for ModelScope models (e.g., 'ms://Qwen/Qwen3-4B'). For consistency, the prefix should be added here as well.

model = TransformersModel(model_id='ms://Qwen/Qwen3-4B', remote_group='default', device_mesh=device_mesh)

tastelikefeet merged commit cdb8342 into modelscope:main on Mar 1, 2026
1 of 3 checks passed